Web Information Extraction: Tag Density and Keyword Approach
Similar resources
Keyword Extraction and Semantic Tag Prediction
Content on the web is often organized through user generated tags for intuitive search and retrieval. Such tags convey meta-information about the subject matter of the texts they represent. For this project, we applied machine learning (Bayesian co-occurrence, k-NN, SVM, NNS) to predict tags of StackExchange posts obtained from Kaggle: “Facebook Recruiting Keyword Extraction III.” Using our non...
A New Approach for Web Information Extraction
With the exponentially growing amount of information available on the Internet, an effective technique for users to discern the useful information from the unnecessary information is urgently required. Cleaning web pages for web data extraction becomes critical for improving performance of information retrieval and information extraction. So, we investigate to remove various noise patterns in W...
The Web-OEM approach to Web information extraction
The enormous amount of information available through the World Wide Web requires the development of effective tools for extracting and summarizing relevant data from Web sources. In this article we present a data model for representing Web documents and an associated SQL-like query language. Our framework provides an easy-to-use and well-formalized method for automatic generation of wrappers ex...
Summarization of Web Pages by Keyword Extraction and Sentence Vector
In this paper we are trying to propose a system that can run in parallel with the usual search engine to provide the user with unified and summarized information. Our system will relieve the user of manual accessing of each of the web links that is produced by the search result of a search engine. To implement such feature in the search process, here we propose a procedure that can identify the...
Journal
Journal title: International Journal of Computer Applications
Year: 2013
ISSN: 0975-8887
DOI: 10.5120/9981-4811